Coping with imbalanced data problem in digital mapping of soil classes


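Abstract

An unsolved problem in the digital mapping of categorical soil variables and soil types is the imbalanced number of observations, which leads to reduced accuracy and the loss of the minority class (the class with a significantly lower number of observations compared to other classes) in the final map. So far, synthetic over- and under-sampling techniques have been explored in soil science; however, more efficient approaches that do not have the drawbacks of these techniques and that guarantee the retention of minority classes in the produced map are essentially required. Such approaches suggested in the present study for soil mapping include the machine learning models of ensemble gradient boosting, cost-sensitive learning, and one-class classification (OCC) combined with multi-class classification. In this regard, extreme gradient boosting (XGB) as an ensemble learner, a cost-sensitive decision tree (CSDT) within the C5.0 algorithm, and a one-class support vector machine combined with multi-class classification (OCCM) were investigated for mapping eight soil great groups with a naturally imbalanced frequency of observations in northwest Iran. A total of 453 soil profile data points were used in the study area. The data split was done manually for each class separately, which resulted in an overall 70% of the data for calibration and 30% for validation. The bootstrapping approach (with 10 runs) was performed to produce multiple maps for each model. The bootstraps were evaluated against the hold-out validation dataset. The average values of the accuracy measures, including Kappa (K), overall accuracy (OA), producer's accuracy (PA) and user's accuracy (UA), were explored. In addition, the results were compared with those of a previous study in the same area, which used resampling techniques to deal with imbalanced data in soil class mapping. The findings show that all three methods can deal well with the imbalanced data problem, with OCCM showing the highest K (= 0.76) and OA (= 82) in the validation stage. Also, the model ... Comparing the results demonstrates that the newly investigated methods can remarkably increase both individual ...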
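The accuracy measures named in the abstract (K, OA, PA and UA) can all be derived from a confusion matrix of reference versus predicted classes. The following is a minimal sketch using the standard textbook definitions, not the authors' code; the function name `accuracy_measures`, the class labels "A" and "B", and the toy data are hypothetical and chosen only to illustrate an imbalanced case.

```python
from collections import Counter


def accuracy_measures(reference, predicted):
    """Compute OA, Cohen's kappa (K), and per-class PA and UA.

    reference: observed class labels (e.g. from hold-out validation profiles)
    predicted: class labels predicted by a model at the same locations
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("reference and predicted must be non-empty and equal in length")

    n = len(reference)
    classes = sorted(set(reference) | set(predicted))
    pairs = Counter(zip(reference, predicted))   # confusion-matrix cells
    ref_totals = Counter(reference)              # row totals
    pred_totals = Counter(predicted)             # column totals

    # Overall accuracy: share of correctly classified observations.
    oa = sum(pairs[(c, c)] for c in classes) / n

    # Agreement expected by chance, from the marginal totals.
    expected = sum(ref_totals[c] * pred_totals[c] for c in classes) / n ** 2
    kappa = (oa - expected) / (1 - expected) if expected < 1 else float("nan")

    # Producer's accuracy: correct / reference total (sensitive to omission).
    pa = {c: pairs[(c, c)] / ref_totals[c] if ref_totals[c] else float("nan")
          for c in classes}
    # User's accuracy: correct / predicted total (sensitive to commission).
    ua = {c: pairs[(c, c)] / pred_totals[c] if pred_totals[c] else float("nan")
          for c in classes}

    return {"OA": oa, "K": kappa, "PA": pa, "UA": ua}


# Hypothetical toy data: "B" is the minority class (2 of 10 observations).
reference = ["A"] * 8 + ["B"] * 2
predicted = ["A"] * 8 + ["A", "B"]

measures = accuracy_measures(reference, predicted)
print(measures)
```

In this toy case OA stays high even though half of the minority-class observations are missed, which is why per-class PA and UA are worth reporting alongside K and OA when class frequencies are imbalanced.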


Similar resources

Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining

This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...


On digital soil mapping

We review various recent approaches to making digital soil maps based on geographic information systems (GIS) data layers, note some commonalities and propose a generic framework for the future. We discuss the various methods that have been, or could be, used for fitting quantitative relationships between soil properties or classes and their ‘environment’. These include generalised linear model...


Focus on Communication in Iranian High School Language Classes: A Study of the Role of Teaching Materials in Changing the Focus onto Communication in Language Classes

Communication in the classroom depends on many factors, including teachers, students, curricula and, most importantly, teaching materials. In communicative language teaching, which places great emphasis on communicative competence, the textbook is regarded as a factor that shapes classroom dynamics: it controls lessons by providing the context for classroom communication as well as the type of language practice that students engage in during classroom activities. This fact...


Renewal of the Hungarian Soil Spatial Data Infrastructure by goal oriented digital soil mapping

The DOSoReMI.hu (Digital, Optimized, Soil Related Maps and Information in Hungary) project was started intentionally for the renewal of the national soil spatial infrastructure in Hungary. During our activities we have significantly extended the potential, how spatial soil information requirements could be satisfied. Soil property, soil type as well as functional soil maps were compiled. The se...


Combining Feature Subset Selection and Data Sampling for Coping with Highly Imbalanced Software Data

In the software quality modeling process, many practitioners often ignore problems such as high dimensionality and class imbalance that exist in data repositories. They directly use the available set of software metrics to build classification models without regard to the condition of the underlying software measurement data, leading to a decline in prediction performance and extension of train...



Journal

Journal title: European Journal of Soil Science

Year: 2023

ISSN: 1365-2389, 1351-0754

DOI: https://doi.org/10.1111/ejss.13368